In [3]:
In [4]:
Out[4]:
   PassengerId  Survived  Pclass  Name                                          Sex      Age  SibSp  Parch   Ticket     Fare  Cabin  Embarked
0          892         0       3  Kelly, Mr. James                              male    34.5      0      0   330911   7.8292  NaN    Q
1          893         1       3  Wilkes, Mrs. James (Ellen Needs)              female  47.0      1      0   363272   7.0000  NaN    S
2          894         0       2  Myles, Mr. Thomas Francis                     male    62.0      0      0   240276   9.6875  NaN    Q
3          895         0       3  Wirz, Mr. Albert                              male    27.0      0      0   315154   8.6625  NaN    S
4          896         1       3  Hirvonen, Mrs. Alexander (Helga E Lindqvist)  female  22.0      1      1  3101298  12.2875  NaN    S
In [5]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  418 non-null    int64  
 1   Survived     418 non-null    int64  
 2   Pclass       418 non-null    int64  
 3   Name         418 non-null    object 
 4   Sex          418 non-null    object 
 5   Age          332 non-null    float64
 6   SibSp        418 non-null    int64  
 7   Parch        418 non-null    int64  
 8   Ticket       418 non-null    object 
 9   Fare         417 non-null    float64
 10  Cabin        91 non-null     object 
 11  Embarked     418 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 39.3+ KB
In [6]:
Out[6]:
       PassengerId     Survived       Pclass          Age        SibSp        Parch         Fare
count   418.000000   418.000000   418.000000   332.000000   418.000000   418.000000   417.000000
mean   1100.500000     0.363636     2.265550    30.272590     0.447368     0.392344    35.627188
std     120.810458     0.481622     0.841838    14.181209     0.896760     0.981429    55.907576
min     892.000000     0.000000     1.000000     0.170000     0.000000     0.000000     0.000000
25%     996.250000     0.000000     1.000000    21.000000     0.000000     0.000000     7.895800
50%    1100.500000     0.000000     3.000000    27.000000     0.000000     0.000000    14.454200
75%    1204.750000     1.000000     3.000000    39.000000     1.000000     0.000000    31.500000
max    1309.000000     1.000000     3.000000    76.000000     8.000000     9.000000   512.329200
In [7]:
Out[7]:
          count  unique  top               freq
Name        418     418  Kelly, Mr. James     1
Sex         418       2  male               266
Ticket      418     363  PC 17608             5
Cabin        91      76  B57 B59 B63 B66      3
Embarked    418       3  S                  270
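The cells behind Out[6] and Out[7] are not shown in this export. A minimal sketch of how summaries with these shapes can be produced in pandas, using a tiny stand-in frame (the values and the variable names `numeric_summary` and `text_summary` are illustrative assumptions, not the notebook's own):

```python
import pandas as pd

# Tiny stand-in frame; the notebook's real DataFrame has 418 rows and 12 columns.
df = pd.DataFrame({
    "Pclass": [3, 3, 2, 1],
    "Sex": ["male", "female", "male", "female"],
    "Fare": [7.8292, 7.0, 9.6875, None],
})

# Numeric columns only: count, mean, std, min, quartiles, max.
numeric_summary = df.describe()

# Non-numeric columns, transposed so each column becomes one row
# with count / unique / top / freq, as in Out[7].
text_summary = df.describe(exclude=["number"]).T

print(numeric_summary)
print(text_summary)
```

The `count` row already hints at missing values: any column whose count is below the number of rows has gaps.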
In [8]:
Out[8]:
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age             86
SibSp            0
Parch            0
Ticket           0
Fare             1
Cabin          327
Embarked         0
dtype: int64
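Per-column missing counts like the ones in Out[8] are typically obtained with `isnull().sum()`. A small sketch on made-up data (the frame contents are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative frame with gaps in the same columns that have gaps in Out[8].
df = pd.DataFrame({
    "Age": [34.5, np.nan, 62.0],
    "Fare": [7.83, 7.00, np.nan],
    "Cabin": [np.nan, np.nan, "B57"],
    "Sex": ["male", "female", "male"],
})

missing = df.isnull().sum()  # one count per column
# Show only the columns that actually have gaps, largest first.
print(missing[missing > 0].sort_values(ascending=False))
```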
In [9]:
Out[9]:
<Axes: >
In [10]:
Out[10]:
(0, 12)
In [11]:
Out[11]:
0
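The `(0, 12)` and `0` outputs look like a duplicate-row check that found nothing, although the cells themselves are not shown. One way to run such a check, sketched on a made-up frame:

```python
import pandas as pd

# Illustrative frame with no repeated rows.
df = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Pclass": [3, 3, 2],
})

duplicate_rows = df[df.duplicated()]         # rows identical to an earlier row
duplicate_count = int(df.duplicated().sum())

print(duplicate_rows.shape)  # (number of duplicate rows, number of columns)
print(duplicate_count)
```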
In [12]:
In [13]:
Out[13]:
['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']
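A list of numeric column names like Out[13] can be built with `select_dtypes`. This is a sketch under the assumption that the notebook selected columns by dtype; the frame is a stand-in:

```python
import pandas as pd

# Stand-in frame mixing numeric and text columns.
df = pd.DataFrame({
    "PassengerId": [892, 893],
    "Name": ["Kelly, Mr. James", "Wilkes, Mrs. James (Ellen Needs)"],
    "Age": [34.5, 47.0],
})

# Keep integer and float columns, drop text columns.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```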
In [14]:
In [15]:
In [16]:
In [17]:
Out[17]:
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age              0
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          327
Embarked         0
dtype: int64
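Out[17] shows that Age and Fare no longer have missing values, but the export does not show how they were filled. A minimal sketch assuming median imputation (the strategy and the sample values are assumptions):

```python
import numpy as np
import pandas as pd

# Stand-in frame with one gap in each numeric column.
df = pd.DataFrame({
    "Age": [34.5, np.nan, 62.0, 27.0],
    "Fare": [7.8292, 7.0, np.nan, 8.6625],
})

# Replace each gap with that column's median, which is robust to the
# long right tail that Fare has in the summary statistics.
for col in ["Age", "Fare"]:
    df[col] = df[col].fillna(df[col].median())

print(df.isnull().sum())
```

Assigning the result back to `df[col]` avoids relying on `inplace=True` on a column selection, which may not modify the original frame in recent pandas versions.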
In [18]:
In [19]:
Out[19]:
PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Embarked       0
dtype: int64
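Cabin was missing in 327 of 418 rows (about 78%), and Out[19] shows it filled, though not with what. A sketch assuming a placeholder category is used (the label `"Unknown"` is hypothetical):

```python
import numpy as np
import pandas as pd

# Stand-in column: mostly missing, like Cabin in this dataset.
df = pd.DataFrame({"Cabin": [np.nan, "B57 B59 B63 B66", np.nan]})

# A placeholder keeps "no cabin recorded" as its own category
# instead of inventing a cabin for most passengers.
df["Cabin"] = df["Cabin"].fillna("Unknown")

print(df["Cabin"].value_counts())
```

With this much missing data, dropping the column entirely is another reasonable option.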
In [20]:
In [21]:
Out[21]:
<Axes: >
In [22]:
<Figure size 700x500 with 0 Axes>
In [23]:
C:\Users\Hp\AppData\Local\Temp\ipykernel_11044\2921346563.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
In [24]:
upper limit:  203.3165847121267
lower limit:  -131.89132155423198
In [25]:
Out[25]:
(18, 12)
In [26]:
before removing the outliers:  418
after removing the outliers:  400
the outliers:  18
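The printed limits are close to Fare's mean ± 3 standard deviations as computed from the summary in Out[6] (about 203.35 and -132.10); the small difference may come from the imputed Fare value. The cell code is not shown, so the following is a sketch of that three-sigma rule on synthetic data, not the notebook's own implementation:

```python
import numpy as np
import pandas as pd

# Synthetic column: 99 ordinary values plus one extreme fare.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Fare": np.append(rng.normal(30.0, 5.0, size=99), 500.0)})

mean, std = df["Fare"].mean(), df["Fare"].std()
upper = mean + 3 * std
lower = mean - 3 * std
print("upper limit: ", upper)
print("lower limit: ", lower)

# Rows outside the band are treated as outliers; the rest are kept.
is_outlier = (df["Fare"] > upper) | (df["Fare"] < lower)
outliers = df[is_outlier]
cleaned = df[~is_outlier]

print("before removing the outliers: ", len(df))
print("after removing the outliers: ", len(cleaned))
print("the outliers: ", len(outliers))
```

Because fares cannot be negative, a negative lower limit never removes anything; for a skewed column like this, an IQR-based rule or a log transform may be worth comparing.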
In [27]:
<Figure size 700x500 with 0 Axes>
In [28]:
C:\Users\Hp\AppData\Local\Temp\ipykernel_11044\2817775539.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
In [29]:
In [30]:
<Figure size 1200x1000 with 0 Axes>
In [31]:
<Figure size 1200x1000 with 0 Axes>
In [32]:
<Figure size 1200x1000 with 0 Axes>
In [34]:
<Figure size 1200x1000 with 0 Axes>
In [38]:
In [39]:
In [40]:
In [47]:
UsageError: Line magic function `%tensorflow_version` not found.
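`%tensorflow_version` is a magic that was provided by Google Colab rather than by IPython itself, so this error is expected in a local Jupyter kernel. A sketch of a portable way to check which version, if any, is installed (the helper name `installed_version` is hypothetical):

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("tensorflow:", installed_version("tensorflow"))
```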
In [50]:
In [51]:
In [53]:
In [54]:
In [56]:
In [57]:
In [59]:
Out[59]:
     Pclass  Sex      Age    Fare
0       3.0  male    34.5    7.83
1       3.0  female  47.0    7.00
2       2.0  male    62.0    9.69
3       3.0  male    27.0    8.66
4       3.0  female  22.0   12.29
...     ...  ...      ...     ...
413     3.0  male    22.5    8.05
414     1.0  female  39.0  108.90
415     3.0  male    38.5    7.25
416     3.0  male    22.5    8.05
417     3.0  male    26.5   22.36

418 rows × 4 columns

In [60]:
C:\Users\Hp\AppData\Local\Temp\ipykernel_11044\3817471325.py:3: SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Out[60]:
   Pclass  Sex   Age  Fare
0     3.0    1  34.5  7.83
1     3.0    0  47.0  7.00
2     2.0    1  62.0  9.69
3     3.0    1  27.0  8.66
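The `SettingWithCopyWarning` usually appears when a column is assigned on a frame that was itself sliced from another frame. A sketch that avoids it by taking an explicit copy before encoding Sex (male as 1, female as 0, matching Out[60]); the variable names are assumptions:

```python
import pandas as pd

# Stand-in for the full frame.
df = pd.DataFrame({
    "Name": ["Kelly, Mr. James", "Wilkes, Mrs. James (Ellen Needs)", "Myles, Mr. Thomas Francis"],
    "Pclass": [3.0, 3.0, 2.0],
    "Sex": ["male", "female", "male"],
    "Age": [34.5, 47.0, 62.0],
    "Fare": [7.83, 7.00, 9.69],
})

# .copy() makes the feature frame independent of df, so assigning a
# column on it is unambiguous and raises no warning.
features = df[["Pclass", "Sex", "Age", "Fare"]].copy()
features["Sex"] = features["Sex"].map({"male": 1, "female": 0})

print(features)
```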
In [61]:
In [62]:
Out[62]:
DecisionTreeClassifier(criterion='entropy', max_depth=3)
In [63]:
Out[63]:
1.0
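A score of exactly 1.0 deserves a second look. In the summaries earlier in this output there are 418 - 266 = 152 female passengers, and a Survived mean of 0.363636 over 418 rows also gives about 152 survivors, which would be consistent with the label following Sex exactly in this file. The score may also have been computed on the rows used for fitting. Neither can be confirmed from the export, so the following is only a sketch of how both possibilities could be checked, on made-up data where the label is constructed to follow Sex:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up data: Survived is 1 exactly when Sex is 0 (female), and
# Age is deliberately uninformative.
df = pd.DataFrame({
    "Sex":      [0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "Age":      [20, 22, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
    "Survived": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# 1) Does a single feature already determine the label?
sex_vs_label = pd.crosstab(df["Sex"], df["Survived"])
print(sex_vs_label)

# 2) Score on rows the model has not seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    df[["Sex", "Age"]], df["Survived"],
    test_size=0.25, random_state=0, stratify=df["Survived"],
)
model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
model.fit(X_train, y_train)
held_out_accuracy = model.score(X_test, y_test)
print(held_out_accuracy)
```

If the crosstab has zeros off the diagonal, a perfect score says more about how the labels were generated than about the model.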
In [64]:
C:\Users\Hp\anaconda3\Lib\site-packages\sklearn\base.py:464: UserWarning:

X does not have valid feature names, but DecisionTreeClassifier was fitted with feature names

Out[64]:
array([0.])
In [65]:
C:\Users\Hp\anaconda3\Lib\site-packages\sklearn\base.py:464: UserWarning:

X does not have valid feature names, but DecisionTreeClassifier was fitted with feature names

Out[65]:
array([[1., 0.]])
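The "X does not have valid feature names" warning is raised when a model fitted on a DataFrame is later given a plain list or array. Passing a one-row DataFrame with the same column names avoids it. A sketch using the first four rows of Out[60] and their Survived values from Out[4] as training data; the new passenger's values are made up:

```python
import warnings

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# First four rows of the encoded feature table, with their labels.
X = pd.DataFrame({
    "Pclass": [3.0, 3.0, 2.0, 3.0],
    "Sex": [1, 0, 1, 1],
    "Age": [34.5, 47.0, 62.0, 27.0],
    "Fare": [7.83, 7.00, 9.69, 8.66],
})
y = [0, 1, 0, 0]

model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
model.fit(X, y)

# Hypothetical passenger, wrapped in a DataFrame that reuses X's columns.
new_passenger = pd.DataFrame([[3.0, 1, 30.0, 8.05]], columns=X.columns)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    prediction = model.predict(new_passenger)
    probabilities = model.predict_proba(new_passenger)

feature_name_warnings = [w for w in caught if "feature names" in str(w.message)]
print(prediction, probabilities, len(feature_name_warnings))
```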